Principal Component Analysis on non-Gaussian Dependent Data
Abstract
In this paper, we analyze the performance of a semiparametric principal component analysis method, Copula Component Analysis (COCA) (Han & Liu, 2012), when the data are dependent. The semiparametric model assumes that, after unspecified marginal monotone transformations, the data follow a multivariate Gaussian distribution. We study the scenario where the observations are drawn from non-i.i.d. processes (m-dependent or, more generally, φ-mixing). We show that COCA tolerates weak dependence. In particular, we provide convergence bounds for both support recovery and parameter estimation of COCA with dependent data. We also provide explicit sufficient conditions on the degree of dependence under which the parametric rate is maintained. To our knowledge, this is the first work analyzing the theoretical performance of PCA for dependent data in high-dimensional settings. Our results strictly generalize the analysis in Han & Liu (2012), and the techniques we use are of independent interest for analyzing a variety of other multivariate statistical methods.
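The modelling assumption above (Gaussian after unknown monotone marginal transformations) suggests estimating the latent correlation matrix from ranks, since ranks are unchanged by such transformations. The following is a minimal sketch under stated assumptions, not the estimator analyzed in the paper: it assumes a Kendall's tau estimate with the sine transform, a standard rank-based choice for Gaussian copula models, and it uses a plain eigendecomposition in place of any sparse PCA step. The function names `kendall_tau_matrix`, `latent_correlation`, and `leading_component` are hypothetical, and the demo data are i.i.d., so the sketch does not illustrate the dependent-data setting.

```python
import numpy as np


def kendall_tau_matrix(X):
    """Pairwise Kendall's tau between the columns of X (assumes no ties)."""
    n, _ = X.shape
    # Sign of every pairwise difference within each column: shape (n, n, d).
    signs = np.sign(X[:, None, :] - X[None, :, :])
    # Sum sign products over all ordered pairs (i, j); i == j contributes 0.
    return np.einsum("ijk,ijl->kl", signs, signs) / (n * (n - 1))


def latent_correlation(X):
    """Rank-based latent correlation estimate (assumed form): sin(pi/2 * tau)."""
    return np.sin(np.pi / 2 * kendall_tau_matrix(X))


def leading_component(R):
    """Unit-norm eigenvector for the largest eigenvalue of a symmetric matrix."""
    _, vecs = np.linalg.eigh(R)  # eigenvalues are returned in ascending order
    return vecs[:, -1]


# Demo: latent Gaussian data observed through monotone marginal transformations.
rng = np.random.default_rng(0)
cov = np.array([[1.0, 0.7, 0.0],
                [0.7, 1.0, 0.0],
                [0.0, 0.0, 1.0]])
Z = rng.multivariate_normal(np.zeros(3), cov, size=200)
X = np.column_stack([np.exp(Z[:, 0]), Z[:, 1] ** 3, Z[:, 2]])

R_hat = latent_correlation(X)
print(np.round(R_hat, 2))
print(np.round(leading_component(R_hat), 2))
```

Because the estimate depends on the data only through the signs of pairwise differences, applying `latent_correlation` to `Z` and to the transformed `X` gives the same matrix, which is the invariance the semiparametric model relies on.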
Similar resources
Gaussian Process Regression for Multivariate Spectroscopic Calibration
Traditionally multivariate calibration models have been developed using regression based techniques including principal component regression and partial least squares and their non-linear counterparts. This paper proposes the application of Gaussian process regression as an alternative method for the development of a calibration model. By formulating the regression problem in a probabilistic fr...
Modelling sparse generalized longitudinal observations with latent Gaussian processes
In longitudinal data analysis one frequently encounters non-Gaussian data that are repeatedly collected for a sample of individuals over time. The repeated observations could be binomial, Poisson or of another discrete type or could be continuous. The timings of the repeated measurements are often sparse and irregular. We introduce a latent Gaussian process model for such data, establishing a co...
Principal Cumulant Component Analysis
Multivariate Gaussian data is completely characterized by its mean and covariance, yet modern non-Gaussian data makes higher-order statistics such as cumulants inevitable. For univariate data, the third and fourth scalar-valued cumulants are relatively well-studied as skewness and kurtosis. For multivariate data, these cumulants are tensor-valued, higher-order analogs of the covariance matrix c...
Gaussian Process Latent Variable Models for Visualisation of High Dimensional Data
In this paper we introduce a new underlying probabilistic model for principal component analysis (PCA). Our formulation interprets PCA as a particular Gaussian process prior on a mapping from a latent space to the observed data-space. We show that if the prior’s covariance function constrains the mappings to be linear, the model is equivalent to PCA; we then extend the model by considering less ...
Multivariate Analysis and Monitoring of Sequencing Batch Reactor Using Multiway Independent Component Analysis
This contribution describes the monitoring on a pilot-scale sequencing batch reactor (SBR) using a batchwise multiway independent component analysis method (MICA) which can extract meaningful hidden information from non-Gaussian data. Given that independent component analysis (ICA) is superior to principal component analysis (PCA) to extract features from non-Gaussian data sets, the use of ICA ...